
    Impact of Online Learning on International Students’ English Language Concerns

    With the onset of COVID-19, U.S. universities have been forced to move many, or even all, of their courses online. At the University of Richmond, many of our international students faced visa restrictions due to COVID-19 and were required to stay in their home countries. As a result, the majority of our international students must attend their classes remotely. International students may find language to be a challenge during online learning. The purpose of our study is to learn more about how, if at all, online classes have an impact on international students’ English language concerns.

    EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance

    Although current neural text-to-speech (TTS) models are able to generate high-quality speech, intensity-controllable emotional TTS is still a challenging task. Most existing methods need external optimizations for intensity calculation, leading to suboptimal results or degraded quality. In this paper, we propose EmoDiff, a diffusion-based TTS model in which emotion intensity can be manipulated by a proposed soft-label guidance technique derived from classifier guidance. Specifically, instead of being guided with a one-hot vector for the specified emotion, EmoDiff is guided with a soft label where the values of the specified emotion and Neutral are set to α and 1-α, respectively. The α here represents the emotion intensity and can be chosen from 0 to 1. Our experiments show that EmoDiff can precisely control the emotion intensity while maintaining high voice quality. Moreover, diverse speech with a specified emotion intensity can be generated by sampling in the reverse denoising process. Comment: Accepted to ICASSP 2023.
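
    The soft-label guidance idea lends itself to a compact illustration. The PyTorch-style sketch below (hypothetical function and argument names, not the authors' code) shows how a classifier gradient could be computed from a soft label that puts weight α on the specified emotion and 1-α on Neutral; in classifier guidance, this gradient would be added to the diffusion model's score at each reverse denoising step.

        import torch

        def soft_label_guidance(classifier, x_t, t, emotion_id, neutral_id, alpha, scale=1.0):
            # Hypothetical soft-label classifier-guidance term (sketch only).
            # The soft label assigns weight alpha to the specified emotion and
            # (1 - alpha) to Neutral; the gradient of the resulting weighted
            # log-probability w.r.t. the noisy input x_t steers the reverse step.
            x_t = x_t.detach().requires_grad_(True)
            log_probs = torch.log_softmax(classifier(x_t, t), dim=-1)  # (batch, num_emotions)
            objective = alpha * log_probs[:, emotion_id] + (1.0 - alpha) * log_probs[:, neutral_id]
            grad = torch.autograd.grad(objective.sum(), x_t)[0]
            return scale * grad  # added to the model score during reverse sampling

    With α = 1 this reduces to ordinary one-hot classifier guidance on the specified emotion, and with α = 0 it guides toward Neutral, so α interpolates the perceived intensity.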

    Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge

    In this paper, we describe the systems developed by the SJTU X-LANCE team for the LIMMITS 2023 Challenge, focusing mainly on the system that won on naturalness for track 1. The aim of this challenge is to build a multi-speaker multi-lingual text-to-speech (TTS) system for Marathi, Hindi and Telugu. Each of the languages has a male and a female speaker in the given dataset. In track 1, only 5 hours of data from each speaker can be selected to train the TTS model. Our system is based on the recently proposed VQTTS, which utilizes VQ acoustic features rather than mel-spectrograms. We introduce additional speaker embeddings and language embeddings to VQTTS for controlling the speaker and language information. In the cross-lingual evaluations, where we need to synthesize speech in a cross-lingual speaker's voice, we provide a native speaker's embedding to the acoustic model and the target speaker's embedding to the vocoder. In the subjective MOS listening test on naturalness, our system achieves 4.77, which ranks first. Comment: Accepted by ICASSP 2023 Special Session for Grand Challenge.
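
    The split conditioning used for cross-lingual synthesis can be summarized in a short sketch (hypothetical interfaces, assuming an acoustic model and a vocoder that each accept a speaker embedding; not the released system): the acoustic model is conditioned on a native speaker of the target language, while the vocoder is conditioned on the desired target speaker.

        import torch

        def synthesize_cross_lingual(acoustic_model, vocoder, text_ids,
                                     native_spk_emb, target_spk_emb, lang_emb):
            # Acoustic model predicts VQ acoustic features from text, conditioned
            # on a native speaker of the target language for natural pronunciation.
            vq_feats = acoustic_model(text_ids, spk_emb=native_spk_emb, lang_emb=lang_emb)
            # Vocoder renders the waveform in the target speaker's voice.
            waveform = vocoder(vq_feats, spk_emb=target_spk_emb)
            return waveform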

    VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature

    The mainstream neural text-to-speech (TTS) pipeline is a cascade system, consisting of an acoustic model (AM) that predicts acoustic features from the input transcript and a vocoder that generates the waveform from the given acoustic features. However, the acoustic feature in current TTS systems is typically the mel-spectrogram, which is highly correlated along both the time and frequency axes in a complicated way, making it difficult for the AM to predict. Although high-fidelity audio can be generated by recent neural vocoders from the ground-truth (GT) mel-spectrogram, the gap between the GT and the mel-spectrogram predicted by the AM degrades the performance of the entire TTS system. In this work, we propose VQTTS, consisting of an AM txt2vec and a vocoder vec2wav, which uses self-supervised vector-quantized (VQ) acoustic features rather than the mel-spectrogram. We redesign both the AM and the vocoder accordingly. In particular, txt2vec becomes a classification model instead of a traditional regression model, while vec2wav uses an additional feature encoder before the HifiGAN generator to smooth the discontinuous quantized features. Our experiments show that vec2wav achieves better reconstruction performance than HifiGAN when using self-supervised VQ acoustic features. Moreover, our entire TTS system VQTTS achieves state-of-the-art performance in terms of naturalness among all currently publicly available TTS systems. Comment: This version has been removed by arXiv administrators because the submitter did not have the authority to assign the license at the time of submission.
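
    The shift from regression to classification is the core change in txt2vec. A minimal sketch of the two training objectives (hypothetical tensor shapes and names, not the authors' code) is given below: discrete VQ code indices are predicted with a cross-entropy loss, whereas a conventional mel-based acoustic model regresses continuous frames.

        import torch
        import torch.nn.functional as F

        def txt2vec_classification_loss(logits, target_codes):
            # logits:       (batch, time, codebook_size) scores over VQ code indices
            # target_codes: (batch, time) ground-truth VQ indices from the SSL codebook
            return F.cross_entropy(logits.transpose(1, 2), target_codes)

        def mel_regression_loss(pred_mel, target_mel):
            # Conventional regression objective used by mel-spectrogram acoustic models.
            return F.l1_loss(pred_mel, target_mel)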

    VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching

    Although diffusion models have become a popular choice for text-to-speech due to their strong generative ability, the intrinsic complexity of sampling from diffusion models harms their efficiency. Alternatively, we propose VoiceFlow, an acoustic model that utilizes a rectified flow matching algorithm to achieve high synthesis quality with a limited number of sampling steps. VoiceFlow formulates the generation of mel-spectrograms as an ordinary differential equation conditioned on text inputs, whose vector field is then estimated. The rectified flow technique then effectively straightens the sampling trajectory for efficient synthesis. Subjective and objective evaluations on both single- and multi-speaker corpora showed the superior synthesis quality of VoiceFlow compared to the diffusion counterpart. Ablation studies further verified the validity of the rectified flow technique in VoiceFlow. Comment: 4 figures, 5 pages, submitted to ICASSP 2024.
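
    Flow matching sampling reduces to integrating an ordinary differential equation with the learned vector field. The sketch below (hypothetical names; the actual VoiceFlow sampler may differ) shows plain Euler integration from Gaussian noise to a mel-spectrogram; the rectified, straightened trajectory is what allows a small number of steps to suffice.

        import torch

        @torch.no_grad()
        def sample_mel(vector_field, text_cond, shape, num_steps=10):
            # Integrate dx/dt = v(x, t | text) from t = 0 to t = 1 with Euler steps,
            # starting from Gaussian noise and ending at the predicted mel-spectrogram.
            x = torch.randn(shape)
            dt = 1.0 / num_steps
            for i in range(num_steps):
                t = torch.full((shape[0],), i * dt)
                x = x + vector_field(x, t, text_cond) * dt
            return x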

    Envisioning an Inclusive Metaverse: Student Perspectives on Accessible and Empowering Metaverse-Enabled Learning

    The emergence of the metaverse is widely viewed as revolutionary owing to a myriad of factors, particularly its potential to increase the accessibility of learning for students with disabilities. However, not much is yet known about the views and expectations of disabled students in this regard. The fact that the metaverse is still in its nascent stage underscores the need for such timely discourse. To bridge this important gap, we conducted a series of semi-structured interviews with 56 university students with disabilities in the United States and Hong Kong to understand their views and expectations concerning the future of metaverse-driven education. We have distilled student expectations into five thematic categories, referred to as the REEPS framework: Recognition, Empowerment, Engagement, Privacy, and Safety. Additionally, we have summarized the main design considerations in eight concise points. This paper is aimed at helping technology developers and policymakers plan ahead and at improving the experiences of students with disabilities. Comment: This paper has been accepted for presentation at the L@S 2023 conference. The version provided here is the pre-print manuscript.

    Design of school bell automatic control system based on single-chip microcomputer

    This article introduces the basic components of the school bell automatic control system and provides a detailed introduction to, and comparison of, the functions, application scenarios, and advantages of each part. The hardware design of the automatic control system takes an STC89C52 single-chip control circuit as its core, supplemented by sensor circuits, clock circuits, bell circuits, and human-computer interaction circuits to complete the various functions. The human-computer interaction circuits include keyboard input circuits and liquid crystal display circuits. The software design of this system mainly includes the sensor detection, button setting, and bell output parts. The sensor detection part is composed of a temperature detection subprogram; the key setting part is composed of an independent key subprogram and a liquid crystal display subprogram; and the bell output part is composed of a voice recording and playback subprogram and a clock subprogram.
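
    The software structure described above can be pictured as a simple main loop. The real firmware runs in C (or assembly) on the STC89C52; the Python sketch below is only a high-level illustration of that loop structure, with hypothetical callback names standing in for the subprograms.

        import datetime
        import time

        def main_loop(read_temperature, read_keys, update_lcd, play_bell, schedule):
            # High-level illustration of the described control flow (not firmware).
            # schedule: iterable of datetime.time objects at which the bell rings.
            ring_times = {(t.hour, t.minute) for t in schedule}
            last_rung = None
            while True:
                now = datetime.datetime.now()
                temperature = read_temperature()        # sensor detection subprogram
                setting = read_keys()                   # independent key subprogram
                update_lcd(now, temperature, setting)   # liquid crystal display subprogram
                current = (now.hour, now.minute)
                if current in ring_times and current != last_rung:
                    play_bell()                         # voice recording/playback subprogram
                    last_rung = current
                time.sleep(0.5)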

    Cloning and expression of pineapple sucrose-phosphate synthase gene during fruit development

    A 1132-base-pair (bp) polymerase chain reaction product of sucrose-phosphate synthase (SPS) (EC 2.4.1.14) from pineapple (Ananas comosus cv. Comte de Paris) fruit was cloned and designated Ac-SPS1. The sequence encodes a putative 377-amino-acid protein containing two conserved serine features that have been found in other plant SPS genes: a specific 14-3-3 protein binding domain and an osmotic-stress activation site, which can be activated by phosphorylation and dephosphorylation. The neighbour-joining tree revealed that Ac-SPS1 belongs to the first class of sucrose-phosphate synthase genes. The results indicated that Ac-SPS1 expression was low in the early period of fruit growth, increased from 20 days after anthesis, fell gradually at 40 days, and reached its peak around 70 days. SPS activity and sucrose content reached their maxima 80 days after anthesis. This showed that the accumulation of sucrose was correlated with SPS activity and mRNA content, with the maximum occurring about 10 days after SPS mRNA and activity had reached their maxima. These results indicated that the Ac-SPS1 gene plays a key role in sucrose accumulation during pineapple fruit development, and that transcriptional activation with an increase in Ac-SPS1 expression may be an important regulatory event for sugar accumulation during pineapple fruit maturation. Key words: pineapple fruit, sucrose-phosphate synthase, gene cloning, expression.

    First Efficacy Results of Capecitabine with Anthracycline- and Taxane-Based Adjuvant Therapy in High-Risk Early Breast Cancer: A Meta-Analysis

    Background: Capecitabine is effective and indicated for the salvage treatment of metastatic breast cancer. It is therefore essential to evaluate the efficacy of capecitabine in the adjuvant setting. There have been two large randomized studies to determine whether patients with high-risk early breast cancer benefit from the addition of capecitabine to standard chemotherapy, but they have yielded inconsistent results. We therefore undertook a first meta-analysis to evaluate the efficacy of adding capecitabine to standard treatment.

    UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding

    The utilization of discrete speech tokens, divided into semantic tokens and acoustic tokens, has been proven superior to the traditional mel-spectrogram acoustic feature in terms of naturalness and robustness for text-to-speech (TTS) synthesis. Recent popular models, such as VALL-E and SPEAR-TTS, allow zero-shot speaker adaptation through auto-regressive (AR) continuation of acoustic tokens extracted from a short speech prompt. However, these AR models are restricted to generating speech only in a left-to-right direction, making them unsuitable for speech editing, where both the preceding and following contexts are provided. Furthermore, these models rely on acoustic tokens, whose audio quality is limited by the performance of audio codec models. In this study, we propose a unified context-aware TTS framework called UniCATS, which is capable of both speech continuation and editing. UniCATS comprises two components: an acoustic model CTX-txt2vec and a vocoder CTX-vec2wav. CTX-txt2vec employs contextual VQ-diffusion to predict semantic tokens from the input text, enabling it to incorporate the semantic context and maintain seamless concatenation with the surrounding context. Following that, CTX-vec2wav utilizes contextual vocoding to convert these semantic tokens into waveforms, taking the acoustic context into consideration. Our experimental results demonstrate that CTX-vec2wav outperforms HifiGAN and AudioLM in terms of speech resynthesis from semantic tokens. Moreover, we show that UniCATS achieves state-of-the-art performance in both speech continuation and editing.
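
    The two-stage design can be summarized in a short sketch. The code below uses hypothetical callable interfaces (not the released implementation) to show how a speech edit would flow through the two components: CTX-txt2vec fills in semantic tokens for the edited text conditioned on the surrounding token context, and CTX-vec2wav vocodes them conditioned on the surrounding acoustic context so the edit splices in seamlessly.

        import torch

        def edit_speech(ctx_txt2vec, ctx_vec2wav, text_ids,
                        left_tokens, right_tokens, left_wave, right_wave):
            # Predict semantic tokens for the new text, conditioned on the
            # semantic tokens of the surrounding (unedited) speech.
            new_tokens = ctx_txt2vec(text_ids, context=(left_tokens, right_tokens))
            # Vocode the new tokens, conditioned on the surrounding acoustic context.
            new_wave = ctx_vec2wav(new_tokens, acoustic_context=(left_wave, right_wave))
            # Splice the edited segment back between the original waveforms.
            return torch.cat([left_wave, new_wave, right_wave], dim=-1)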